Pattern Recognition Applied To The Acquisition Of A Grammatical Classification System From Unrestricted English Text
Abstract
Within computational linguistics, the use of statistical pattern matching is generally restricted to speech processing. We have attempted to apply statistical techniques to discover a grammatical classification system from a Corpus of 'raw' English text. A discovery procedure is simpler for a simpler language model; we assume a first-order Markov model, which (surprisingly) is shown elsewhere to be sufficient for practical applications. The extraction of the parameters of a standard Markov model is theoretically straightforward; however, the huge size of the standard model for a Natural Language renders it incomputable in reasonable time. We have explored various constrained models to reduce computation, with varying degrees of success.
Pattern recognition and NLP
In the area of language-related computational research, there is a perceived dichotomy between, on the one hand, "Natural Language" research dealing principally with syntactic and other analysis of typed text, and on the other hand, "Speech Processing" research dealing with synthesis, recognition, and understanding of speech signals. This distinction is not based merely on a difference of input and/or output media, but also seems to correlate with noticeable differences in the assumptions and techniques used in research. One example is the use of statistical pattern recognition techniques: these are used in a wide variety of computer-based research areas, and many speech researchers take it for granted that such methods are part of their stock in trade. In contrast, statistical pattern recognition is hardly ever even considered as a technique to be used in "Natural Language" text analysis.
One reason for this is that speech researchers deal with "real", "unrestricted" data (speech samples), whereas much NLP research deals with highly restricted language data, such as examples intuited by theoreticians, or simplified English as allowed by a dialogue system, such as a Natural Language Database Query system. Chomsky (57) did much to discredit the use of representative text samples or Corpora in syntactic research; he dismissed both statistics and semantics as being of no use to syntacticians: "Despite the undeniable interest and importance of semantic and statistical studies of language, they appear to have no direct relevance to the problem of determining or characterizing the set of grammatical utterances" (Chomsky 57 p.17). Subsequent research in Computational Linguistics has shown that Semantics is far more relevant and important than Chomsky gave it credit for. Phenomenal advances in computer power and capabilities mean that we can now try statistical pattern recognition techniques which would have been incomputable in Chomsky's early days. Therefore, we felt that the case for Corpus-based statistical Pattern Recognition techniques should be reopened. Specifically, we have investigated the possibility of using Pattern Recognition techniques for the acquisition of a grammatical classification system from Unrestricted English text.
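As a rough illustration of the first-order Markov assumption mentioned in the abstract, the sketch below estimates transition probabilities P(next | current) by relative frequency over adjacent tokens. It is not the authors' discovery procedure: the function name `estimate_transitions`, the toy sentence, and the choice of word tokens as states are all assumptions made for illustration only. It also shows why an unconstrained model grows quickly, since a full transition table needs one cell per ordered pair of word types.

```python
from collections import Counter, defaultdict


def estimate_transitions(tokens):
    """Relative-frequency estimate of P(next | current) from adjacent token pairs."""
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        pair_counts[current][nxt] += 1
    transitions = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        transitions[current] = {w: c / total for w, c in followers.items()}
    return transitions


# Toy stand-in for a corpus of raw text (hypothetical data, not from the paper).
tokens = "the cat sat on the mat and the dog sat on the cat".split()
transitions = estimate_transitions(tokens)

# A full transition table has one cell per ordered pair of word types,
# so its size grows with the square of the vocabulary.
vocabulary_size = len(set(tokens))
full_table_cells = vocabulary_size ** 2
```

With a realistic vocabulary of tens of thousands of word types, the same table would need hundreds of millions of cells or more, which gives a sense of why constrained models are attractive.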
Similar resources
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques, and conversion of any handwritten document into structured text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
Level of Grammatical Proficiency and Acquisition of Functional Projections: The case of Iranian learners of English language
Unlike Lexical Projections, Functional Projections (Extended Projections) are more abstract in nature. Therefore, Functional Projections seem to be acquired later than Lexical Projections by L2 learners. The present study investigates Iranian L2 learners’ acquisition of English Extended Projections, taking into account their level of grammatical proficiency. Specifically, the aim is ...
Reassembling Formal Features in Articles by L1 Persian Learners of L2 English
There has been considerable debate over what the sources of morphological variation in second language acquisition are. From among various hypotheses put forth on the topic, the feature reassembly hypothesis (Lardiere, 2005) assumes that it is the reconfiguration of features in the L2 which causes variation between the performance of natives and non-natives. Acknowle...
A Comparative Study of Nominalization in an English Applied Linguistics Textbook and its Persian Translation
Among the linguistic resources for creating grammatical metaphor, nominalization rewords processes and properties metaphorically as nouns within the experiential metafunction of language. Following Halliday's (1998a) classification of grammatical metaphor, the current study investigated nominalization exploited in an English applied linguistics textbook and its corresponding Persian translati...